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Abstract 

Background: Understanding polyphenism, the ability of a single genome to express multiple morphologically 
and behaviourally distinct phenotypes, is an important goal for evolutionary and developmental biology. 
Polyphenism has been key to the evolution of the Hymenoptera, and particularly the social Hymenoptera where 
the genome of a single species regulates distinct larval stages, sexual dimorphism and physical castes within the 
female sex. Transcriptomic analyses of social Hymenoptera will therefore provide unique insights into how 
changes in gene expression underlie such complexity. Here we describe gene expression in individual 
specimens of the pre-adult stages, sexes and castes of the key pollinator, the buff-tailed bumblebee Bombus 
terrestris. 

Results: cDNA was prepared from mRNA from five life cycle stages (one larva, one pupa, one male, one gyne 
and two workers) and a total of 1,610,742 expressed sequence tags (ESTs) were generated using Roche 454 
technology, substantially increasing the sequence data available for this important species. Overlapping ESTs 
were assembled into 36,354 B. terrestris putative transcripts, and functionally annotated. A preliminary 
assessment of differences in gene expression across non-replicated specimens from the pre-adult stages, castes 
and sexes was performed using R-STAT analysis. Individual samples from the life cycle stages of the bumblebee 
differed in the expression of a wide array of genes, including genes involved in amino acid storage, metabolism, 
immunity and olfaction. 

Conclusions: Detailed analyses of immune and olfaction gene expression across phenotypes demonstrated how 
transcriptomic analyses can inform our understanding of processes central to the biology of B. terrestris and the 
social Hymenoptera in general. For example, examination of immunity-related genes identified high 
conservation of important immunity pathway components across individual specimens from the life cycle stages 
while olfactory-related genes exhibited differential expression with a wider repertoire of gene expression within 
adults, especially sexuals, in comparison to immature stages. As there is an absence of replication across the 
samples, the results of this study are preliminary but provide a number of candidate genes which may be 
related to distinct phenotypic stage expression. This comprehensive transcriptome catalogue will provide an 
important gene discovery resource for directed programmes in ecology, evolution and conservation of a key 
pollinator. 
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Background 

A major problem in biology is understanding phenotypic 
plasticity. Phenotypic plasticity, the ability of a single 
genotype to express alternative forms of morphology, 
physiology and behaviour in response to environmental 
conditions [1], provides the opportunity to study the 
influences of environment on the genome of an indivi- 
dual or group. Within the natural world, phenotypic 
plasticity is widespread and has been key to speciation 
and evolution [1-3]. However, within eusocial species it 
has resulted in polyphenism, where multiple distinct 
adult phenotypes result from differential expression of a 
single genome [4]. Revealing how multiple sets of genes 
are differentially expressed within the distinct pheno- 
types of eusocial species offers an unprecedented oppor- 
tunity to understand the molecular mechanisms related 
to polyphenisms. 

The eusocial Hymenoptera (ants, some bees and 
wasps) present an excellent system in which to study 
how gene expression relates to numerous types of poly- 
phenism. Firstly, the social Hymenoptera undergo com- 
plete metamorphosis, or holometabolous development, 
where four morphologically distinct developmental 
stages (egg, larva, pupa and adult) exist. Holometabolous 
development is widespread in the superorder Endoptery- 
gota and the success of the life history trait is reflected 
in the high rates of diversification of species that 
undergo complete metamorphosis [5]. Secondly, they 
possess caste differentiation within the female sex, 
where a clear division of labour is evident between two 
or more physiologically, behaviourally and, in many 
cases, morphologically distinct phenotypes [6,7]. The 
reproductive duties of the colony are dominated by a 
queen while the majority of individuals serve as func- 
tionally sterile workers that perform altruistic tasks such 
as larval feeding, resource foraging, nest maintenance 
and colony defence [8]. Third, social Hymenoptera dis- 
play haplodiploid sex determination, with females pro- 
duced from diploid eggs, while haploid eggs develop 
into males [9]. 

Transcriptomic studies have the potential to unveil 
key gene expression differences that govern central bio- 
logical processes, such as immunity and olfaction, within 
and across life cycle stages. A number of studies have 
been performed in social Hymenoptera to determine 
which genes are differentially expressed across the adult 
castes [10-16] and across the sexes [17-19]. These ana- 
lyses addressed only subsets of the complete transcrip- 
tomes of the target species. Recent advances in 
sequencing technology, such as Roche 454 and lUumina 
sequencing platforms, generate vastly higher volumes of 
data [20,21]. As a consequence, these technologies have 
been used in several studies of non-model species, such 



as the Granville butterfly [22], the Propertius duskywing 
and the Anise swallowtail butterflies [23], the migratory 
locust Locusta migratoria [24] and the primitively euso- 
cial wasp, Polistes metricus [25]. These studies demon- 
strate the potential of transcriptomics to provide 
insights into the expression of polyphenisms in insects, 
including eusocial species. Thus, the utilisation of tran- 
scriptomic tools can significantly improve our knowl- 
edge of what genes influence the evolution of 
polyphenism within eusocial insect species. 

The buff-tailed bumblebee Bombus terrestris, which is 
common across Eurasia, is an important ecological polli- 
nator of a wide variety of crops [26,27]. The success of 
B, terrestris as a pollinator is reflected in its increased 
utilisation in commercial agriculture, a multi-million 
dollar industry [28]. Caste differentiation in the female 
sex is evident in B. terrestris with a single monandrous 
queen responsible for the main reproductive duties of 
the colony while functionally sterile workers perform 
altruistic tasks. As in all social Hymenoptera, sex deter- 
mination is haplodiploid. B, terrestris has an annual life 
cycle, where queens overwinter for several months, post- 
mating. Post-hibernation, the overwintered queen estab- 
lishes a colony in early Spring. She constructs the initial 
nest, and lays eggs. These hatch after four days into lar- 
vae that are completely dependent for feeding from the 
queen or sister workers for 10 to 14 days [29]. After 
spinning a cocoon or pupal case, the larva pupates for 
approximately fourteen days and metamorphoses into 
the adult [26]. Bombus workers display a range of sizes, 
with size being correlated to function, such as larger 
bees functioning as foragers [30]. Worker-destined 
diploid eggs continue to be laid by the queen until a 
transition point occurs during the colony life cycle, 
known as the competition phase, where the workers 
begin to develop functional ovaries and compete with 
the queen for reproductive output. Workers lay unferti- 
lised male-destined haploid eggs. The initiation of the 
competition phase coincides with haploid egg laying by 
the queen while diploid eggs present in the nest may 
develop into gynes, or virgin queens. Sexuals leave the 
colony soon after emergence. Virgin queens are mated 
and subsequently, locate a suitable site to diapause for 
the duration of the winter months [26]. 

Two previous transcriptomic studies for B, terrestris 
have been performed. Sadd et al. [31] generated 29,428 
expressed sequence tags (ESTs) from the thorax and 
abdomen of four pooled workers (2 were seven days 
post-emergence while the second 2 were aged fourteen 
days post-emergence) using Sanger sequencing technol- 
ogy. More recently, Woodard et al. [32] generated 454 
reads from the brains and abdomens of over 50 workers, 
to include in an analysis of genes central to convergent 
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evolution of eusociality within the bees. While both stu- 
dies have provided valuable resources for the study of 
the genomics of B. terrestris, they focused on gene 
expression within workers and as genes may be differen- 
tially regulated throughout development, caste or sex, a 
study incorporating multiple life cycle stages is required 
to increase our knowledge of gene expression in this 
important pollinator. 

Here we present a deep-sequencing Roche 454 tran- 
scriptome study of the pre-adult stages, adult castes, and 
sexes of B, terrestris, based on individual specimens. We 
identify potentially differentially expressed transcripts 
related to polyphenism and use differential expression 
to explore hypotheses for the involvement of two impor- 
tant biological processes, immunity and olfaction, in the 
life cycle of the bee. These processes were chosen 
because they represent generally important aspects of 
animal biology, they are expected to exhibit specific pat- 
terns of differential expression in social insects, and they 
can act as exemplars for the power of the transcriptomic 
approach. Immune defence against foreign agents would 
be expected to be heightened in life cycle stages that are 
more prone to infection, such as larvae, which are 
housed in a homeostatic nest environment with a high 
density of nest-mates and consumable resources [33], 
compared to those that are more protected, such as 
pupae, which are enclosed in a sealed cocoon. In adults, 
workers, which have increased exposure to the environ- 
ment through foraging and increased contact with 
potentially infected individuals in the colony, and gynes, 
which need to survive mating and hibernation, would 
require a heightened immune response to increase long- 
evity compared to the non-social short-lived male 
[34-36]. Olfaction is a key aspect of animal biology, and 
in Bombus is particularly important for nest-mate recog- 
nition and communication [37], resource discrimination 
[38], subordinate control by the queen [39] and also 
mate selection [40]. We thus predict that specific olfac- 
tion-related genes would be upregulated in the adult 
stages, for purposes of resource discrimination and mate 
recognition, while distinct repertoires will be active in 
pre-adult stages, for caste-development pheromone 
reception. 

Results 

Sequencing, assembly and assembly validation 

Six cDNA preparations were sequenced using the Roche 
454 Life Sciences GS FLX Titanium Series sequencer, 
generating a total of 1,610,742 sequences. After removal 
of adapters and poly(A) tails, and trimming for base 
quality, there were 1,560,873 high quality sequences 
with an average length of 323 bases. Using the method 
outlined by Kumar and Blaxter [41], these were 
assembled using two different assemblers, Roche 454's 



gsAssembler ("Newbler") and MIRA, to generate the 
first-order assemblies that we will refer to as New- 
bler_454 and MIRA_454. There were 38,212 contigs in 
Newbler_454 with a mean length of 650 bases, while 
MIRA_454 contained 65,786 with a mean length of 668 
bases (detailed assembly information is provided in 
Additional file 1). Using PartiGene, 29,428 B, terrestris 
Sanger ESTs generated by Sadd et al. [31] and 234 B. 
terrestris mRNA sequences (obtained from EMBL 
nucleotide database) were clustered into 12,337 contigs 
(hereafter referred to as PG_Sanger). The three first 
order assemblies (PG_Sanger, Newbler_454 and 
MIRA_454) were coassembled using CAP3 to generate 
the BT_transcriptome_vl contig set. This contained 
22,318 contigs with contributions from more than one 
first order contig, of which 4,867 had contributions 
from all three first order assemblies, and 33,819 addi- 
tional contigs containing single first order contigs (Fig- 
ure 1). As expected, the improved data sampling in 
terms of depth, life cycle stage, caste and sex resulted in 
there being an excess (14,675 contigs) of CAP3 contigs 
with contribution from only Newbler_454 and 
MIRA_454. We regard those contigs with independent 
evidence from Sanger and 454 data, and presence in all 
three primary assemblies as being highly credible. CAP3 
contigs with contributions from two or only one pri- 
mary assembly are likely to be of lower credibility. 
We noted that there was still some residual redundancy 
in BT_transcriptome_vl contig set, and thus reclustered 
using the 56,137 second order contigs with PartiGene 
(removing 1,249 contigs with length of 100 bases or 
less). This dataset, BT_transcriptome_v2, contained 
36,354 contigs. Of these, 25,886 were unchanged from 
BT_transcriptome_vl, while the remaining 29,210 
BT_transcriptome_vl contigs were clustered into 10,468 
BT_transcriptome_v2 contigs. BT_transcriptome_v2 
contigs have an average length of 1,102 bases (minimum 
length = 100 bases, maximum length = 26,105 bases) 
with an N50 contig size of 1,533 bases. The total span 
of the BT_transcriptome_v2 contigs is 40MB. Full 
details of the assembly are provided in Additional file 1. 

PG_Sanger contigs that did not coassemble with the 
454 primary assemblies were examined for biological 
credibility. Approximately 75% (22,142) of the B. terres- 
tris Sanger sequences were included in contigs with 454 
data while in comparison, 862,818 (55% of total high 
quality screened reads) 454 reads mapped to the 
PG_Sanger contigs. The remaining 7,519 Sanger 
sequences clustered into 5,266 contigs, of which 4,091 
(77%) were singletons. Thus we conclude that the San- 
ger-sequencing derived contigs that lack 454 data sup- 
port are more likely to be rarely expressed genes or 
technical artefacts. However, AT content was approxi- 
mately equal for both sets (63.8% for Sanger-only 
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Figure 1 Bombus terrestris transcriptome_v1 assembly, (a) Venn diagram depicting tine contig contribution to CAPS second order contigs 
from tine tliree primary assemblers, MIRA_454 (green), Newbler_454 (red) and PG_Sanger (blue) contigs, consisting of previously available Sanger 
sequenced ESTs and mRNA available on GenBank. Venn diagram displays overlaps between the first order contig assemblers in CAPS second 
order assembly. Numbers in bold in the figure correspond to the number of contigs assembled or unassembled from primary assemblies by 
CAPS, (b) Map of regions of Venn diagrams depicting range in contig credibility: 1° contigs are highly credible, consisting of contributions from 
all three primary assemblies; 2° contigs are less credible contigs, consisting of contributions from only two primary assemblies; S° contigs consist 
of CAPS contigs assembled from only one primary assembly set; 4° contigs are unassembled contigs from the primary assembly and are 
considered poorly credible. 



compared to 64% for those that coassembled with 454), 
suggesting that these isolated contigs do not derive from 
genomic contamination. In terms of gene content, 43% 
of the 5,266 Sanger-only contigs had significant 
(BLASTx E-value < le-6) matches to the nonredundant 
protein database, again suggesting that many have 
recognisable coding potential and are less likely to be 
artefacts. Full information on testing of credibility is 
provided in Additional file 2. 

The B. terrestris genome project has released a first 
draft assembly and 95.9% of the BT_transcriptome_v2 
contigs (n = 34,861) had high significance megablast 
matches (E-value < le-65) to these data. Woodard et al. 
[32] generated an assembly of 19,485 B, terrestris contigs 
from Roche 454 data derived from sampling the brains 
and abdomens of over 50 bees, which were obtained 
from the TSA database on NCBI. Nearly 75% of these 
contigs (14,576) matched 9,003 contigs within BT_tran- 
scriptome_v2. The contigs unique to the study of Woo- 
dard et al. [32] may derive from rare transcripts, or 



from differences in origin of samples as Woodard et al. 
[32] generated transcripts from only brains and abdo- 
mens of workers while we used whole bodies of several 
stages. 

Annotation of BT_transcriptome_v2 

We compared BT_transcriptome_v2 to the NCBI nr 
protein database and found significant matches 
(BLASTx, with E-value cut-off of le-06) for 57.8% (n = 
21,028) contigs (Figure 2). Of these contigs, 14,268 were 
assembled from more than one first order assembly. 
Based on the fact that these contigs were independently 
assembled using two different algorithms, we may have 
greater confidence in the biological validity of these 
14,268 contigs. The majority of top-scoring matches 
(20,958 or 57.6%) were to Hymenoptera, including Apis 
species (36.4% of all contigs), the ant species Harpeg- 
nathos saltator (5.7%), Camponotus floridanus (5.5%) 
and Solenopsis invicta (3.4%), the parasitoid wasp Naso- 
nia species (1.7%) and Bombus species (1.1%) (the 
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Figure 2 Pie-chart displaying the species distribution for top BLAST match for BT_transcriptome_v2 contig set against NCBI nr 
database. BLASTx, E-value cut-off of le-06, against NCBI nonredundant database generated putative matclies for 21,028 B. terrestris sequences. 
Species generating tine most putative matclies were Hymenopteran insects: 13,251 [Apis species); 2,066 (C. floridonus); 1,996 (H. soltotor); 1,219 (5. 
invicto); 608 {Nosonia species); and 410 {Bombus species). Others account for 1,478 putative BLAST matclies for BT_transcriptome_v2 contigs. 



proportional representation reflecting the available pro- 
tein sequence data). We compared the transcriptome 
assembly to the whole-genome derived proteomes of a 
set of model Hymenoptera and other insects. Of the 

36.354 BT_transcriptome_v2 contigs, 19,744 (54.3%) 
matched a predicted protein from the honeybee Apis 
mellifera genome, 19,640 (54.0%) matched predicted 
proteins in the ant C. floridanus, and 19,762 (54.4%) 
matched predicted proteins in the ant H. saltator. There 
were 14,258 (39.2%) matches to the Drosophila melano- 
gaster transcriptome. To generate a gene estimate for 
our BT_transcriptome_v2 contig set, we examined the 
amount of best BLAST matches (tBLASTx cut-off of le- 
10) between our transcriptome set and the OGS of the 
honeybee, A. mellifera. In total, the BT_transccripto- 
me_v2 contig set matched 9, 217 unique predicted pro- 
teins from the latest Apis genome OGS suggesting the 
potential for an equal number of homologous protein- 
encoding genes within our transcriptome set. Contigs 
that had no match to a previously predicted protein 
may derive from non-coding RNAs, untranslated regions 
of mRNAs, or from protein coding genes highly 
diverged in or novel to B. terrestris. 

Functional annotation classification using the GO, EC 
and KEGG ontologies using annotSr resulted in the 
assignment of 533,897 GO terms, 14,345 EC terms and 

47.355 KEGG terms to BT_transcriptome_v2 contigs. 
Approximately 39% (n = 13,996) of contigs were anno- 
tated with GO terms. While many other contigs had sig- 
nificant BLAST matches in protein databases, these 
were to genes that have no GO annotation. GO-Slim 
analyses are provided in Additional file 3. InterProScan 



searches were performed generating predicted protein 
signatures for 24,998 BT_transcriptome_v2 contigs, of 
which 10,358 contigs received InterProScan (IPR) anno- 
tations with information regarding protein family, occur- 
rences of functional domains and repeats. 14,640 contigs 
had no IPR annotation but were annotated with func- 
tional domains. 

Conservation of major biological processes and pathways 
across Insecta 

To assess the completeness of the B. terrestris transcrip- 
tome represented in our data, we investigated the pre- 
sence in BT_transcriptome_v2 of genes in conserved 
developmental and physiological pathways across a set 
of sequenced insect genomes. For each pathway or sys- 
tem, a set of canonical genes was collated from the pro- 
teomes of four species (the eusocial hymenopteran. A, 
mellifera, the solitary hymenopteran, Nasonia vitripen- 
nis, the holometabolous dipteran, D. melanogaster, and 
the hemimetabolous hemipteran, Acyrthosiphon pisum; 
see Additional file 4) and these were compared to 
BT_transcriptome_v2 (Table 1). For each species, 
between 98.5% and 96.7% of the surveyed genes had 
matches in BT_transcriptome_v2, suggesting that deep 
sequencing with 454 technology has indeed yielded 
comprehensive insight into the core regulatory machin- 
ery in B. terrestris. 

Life cycle differential expression in B. terrestris 

We estimated the expression level of each contig by 
counting 454 reads mapped back to the BT_transcripto- 
me_v2 dataset. The majority (1,441,743 ESTs or 92.4%) 
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Table 1 Conservation of biological processes and pathways across Insecta. 



Biological Processes and Pathways 



A. mellifera 



N. vitripennis 



D. melanogaster 



A. pisum 



Development 

Dorsoventral Axis Formation 
Progesterone mediated oocyte maturation 
Developmental Signalling Pathways 

Decapentaplegic 

Hedgeiiog 

Notcli 

Wnt 

Metabolism 

Metabolic Pathways 
Physiology 

Circadian rliythm 

JAK-STAT 

MAPK 

mTOR 

Neuroactive-ligand binding receptor 

Pliototransduction 

Proteasome 

Defence 

Endocytosis 
Lysosome 

Natural Killer Cell Cytotoxicity 
Phagosome 

Regulation of Autophagy 
Total 

(proportion) 



21 (22) 

50 (50) 

21 (22) 
23 (25) 
16(17) 
62 (63) 

526 (532) 

6(7) 
13 (14) 
9 (11) 
23 (23) 
27 (29) 
26 (28) 
35 (35) 

69 (69) 

51 (52) 
19 (19) 
38 (39) 
12 (12) 

1047 
97.9% 



10 (10) 
41 (41) 

19 (19) 
24 (25) 
18 (18) 
60 (60) 

497 (508) 

7(7) 

11 (12) 
6(7) 

20 (20) 
15 (15) 
24 (24) 
35 (35) 

66 (66) 
53 (53) 
13 (13) 
50 (51) 
11 (11) 
980 
98.5% 



63 (65) 
59 (59) 

33 (34) 
36 (37) 
31 (32) 
118 (119) 

1118 (1169) 

31 (36) 

25 (28) 

42 (43) 

43 (43) 
45 (46) 
50 (50) 

139 (139) 
113 (116) 
36 (36) 
99 (100) 
18 (18) 
2099 
96.7% 



17 (17) 
37 (39) 

23 (24) 
15 (15) 

18 (19) 
49 (50) 

528 (533) 

8(8) 
17 (17) 
15 (17) 
25 (25) 
12 (12) 
17 (17) 
45 (45) 

75 (75) 
55 (59) 
22 (22) 
44 (44) 
14 (14) 
1036 
98.5% 



* Pathway components unavailable 

Biological processes and pathways with the number of pathway proteins with significant matches to BT_transcrlptome_v2 contigs In the honeybee, A mellifera, 
the jewel wasp, N. vitripennis, the frultfly, D. melanogaster, and the pea aphid, A. pisum. The number of reference proteins are given In brackets. 



were mapped uniquely to 33,875 contigs. 7,635 contigs 
(22%) received ESTs from all life cycle stages, and these 
universally expressed genes represented a majority (a 
mean of 74.5%) of the ESTs from each life cycle stage. 
There were 1,678 (4.9%) singletons (contigs with a single 
EST mapping). For 6,341 of the 33,875 contigs expres- 
sion was detected in only one life cycle stage (Figure 3). 
The larval and pupal stages had the highest proportion 
of stage-restricted contigs (-4% each). The adult stages 
had few stage-restricted contigs (workers: 3,865 [0.75% 
of total worker reads]; male: 4,182 [1.62% of total]; gyne: 
1,449 [0.65% of total]). The two worker samples were 
very similar, with 95% (worker 1) or 97% (worker 2) of 
reads mapping to 13,611 shared contigs. Further infor- 
mation regarding the number of unique contigs is pro- 
vided in Additional file 5. 

R-STAT analysis of differential gene expression amongst 
life cycle stages and castes 

Although our life cycle stage samples are non-replicated, 
with the exception of the workers (N = 2), to provide a 
preliminary assessment of potential differential 



expression across the life cycle stages of the bee, we 
employed the R-STAT method devised by Stekel et al. 
[42]. As we did not replicate across life cycle stages, 
with the exception of the workers, we do not have a full 
understanding of age or individual variation within the 
differential expression analysis but operating under the 
assumption that variation would be minimal between 
life cycle stages, we examined the following. In total, 
2,185 BT_transcriptome_v2 contigs were identified as 
having high R-values (R-value > 20; following the meth- 
ods by Stekel et al. [42]). The thirty contigs that had the 
highest R-values amongst all life cycle stages are detailed 
in Table 2. The ten genes that best distinguish the dif- 
ferent life-stages are provided in Additional file 6. 

Gene expression within the workers 

The contig with the highest R-value (BTT05460_1; R- 
value = 10,101) was identified as alpha-glucosidase, with 
elevated expression in both workers (Table 2; Additional 
file 6, Tables S6c and S6d). The workers also had ele- 
vated expression of genes involved in flight (sallimus; 
BTT20899_3); defence (bombolitin; BTT20391_1); 
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Figure 3 Contigs unique to B. terrestris life cycle stages within the BT_transcriptome_v2 contig set. Column chart displaying the number 
of singletons (sequences consisting of 1 454 read) and contigs containing more than 1 454 read generated uniquely from reads contributed 
from each of the B. terrestris libraries within the BT_transcriptome_v2 contig set. The proportion of the stage restricted contigs is provided as a 
percentage of the total BT_transcriptome_v2 contig set. 



metabolism (two contigs, BTT05294_1 and 
BTT33135_1, matching cytochrome P450 proteins); and 
a protein with domain matches to both haemolymph 
juvenile hormone binding and mite allergen group-7 
(BTT05272_1). 

Examination of the ten genes that best distinguish the 
worker caste (based on genes highly expressed in both 
worker samples) identified genes involved in metabo- 
lism, chitin binding and defence (see Additional file 6: 
Tables S6c and S6d)). Both workers exhibited high 
expression of genes involved in metabolism, such as oxi- 
dation of organic compounds (three contigs matching 
three cytochrome P450 proteins (BTT22253_1, 
BTT24074_1 and BTT22199_1)), and glucose break- 
down, (glyceraldehyde-3-phosphate dehydrogenase 
(BTT00029_1) and glucose oxidase (BTT20590_1)). A 
lipase (BTT15820_1), involved in the breakdown of 
lipids essential for periods of high activity, was also 
highly expressed in the workers. Two contigs 
(BTT05313_2 and BTT35235_1) matched chitin-bind- 
ing, peritrophin-like proteins. Two worker-upregulated 
contigs matched genes implicated in immune defence: 
allergen-related G12 protein (BTT05276_1) and bombo- 
litin (BTT05263_2). Other contigs highly differentially 
expressed in workers included diapause-related protein 
41 (BTT05480_1; physiological function unknown), and 



two contigs matching hypothetical proteins from the ant 
C. floridanus (BTT20743_1) and the honeybee 
(BTT05577_1). BTT05480_1 and BTT20743_1 have 
both transmembrane and signal peptide domains sug- 
gesting a cell-surface or membrane-bound location. 

Gene expression in the gyne 

The top ten differentially expressed genes included ele- 
vated expression in the gyne of an amino acid storage 
protein, hexamerin (BTT05275_2) (see Table 2). Within 
the ten genes that best distinguish the gyne (see Addi- 
tional file 6: Table S6f), five contigs matched storage 
proteins: four hexamerins (BTT05275_2, BTT36615_1, 
BTT05275_1 and BTT05277_1) and one arylphorin 
(BTT05260_1). Apart from genes involved in storage, 
one contig (BTT00028_1) matched acyl-CoA delta-9 
desaturase, a protein involved in fatty acid biosynthesis, 
another a chitin-binding peritrophin (BTT05289_2) and 
a third crooked, a protein involved in maintenance of 
tracheal tube structure (BTT18720_1). Two contigs had 
matches to hypothetical proteins in the ant Brachymyr- 
mex patagonicus (BTT07422_1) and the ant C. florida- 
nus (BTT20597_1). BTT07422_1 was annotated with a 
predicted signal peptide domain while BTT20597_1 had 
predicted transmembrane, signal peptide and fibronec- 
tin-like 1 domains. 
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Table 2 R-STAT analysis for the life cycle stages of B. terrestris. 



BTT_contig Functional 
description 



Best BLAST Match and GenBank 
Accession [species] 



Read counts per contig 



Larva^ Pupa*" Workerl^ Worker2'* Male^ Gyne^ Total^ 



R- 
value 



B™5460_1 
B™5364_1 
B™3121_2 

B™5275_2 
B™5433_2 

B™5434_1 

B™5442_1 

B™5434_2 

BTO5255_1 
BTO0945_1 
B^17751_l 

B™5366_1 

BTO3482_1 
BTO5822_1 
B^18911_l 

B™7410_1 

B™5437_1 

BTO0899_3 

BTO0391_1 
BTO1092_1 

B™5276_2 

BTO8955_1 

BTO4170_1 

B™5294_1 

BTO0843_1 

BTO2853_1 
BTO4170_2 



Alpha- 
glucosidase 

Allergen-related 

Endocuticle 

structural 

glycoprotein 

Hexamerin 

Hexamerin 

Peregrin-like 

Hexamerin 

Beta- 

ureidopropionase 

Hexamerin 

Hexamerin 

Short-chain 

dehydrogenase/ 

Reductase 

Endocuticle 

structural 

glycoprotein 

Hexamerin 

Hexamerin 

Hexamerin 

Vitellogenin 

Hexamerin 

Sallimus 



Bomboilitin 

Alpha- 
glucosidase 

Allergen-related 
Beta- 

ureidopropionase 

Hymenoptaecin 

Cytochrome 
P450 

Larval cuticle 
protein 

Hexamerin 

Hymenoptaecin 



Alpha-glucosidase [B. 
ignitus] 

PREDICTED: similar to 
CG4409-PA [A. mellifem] 

Endocuticle structural 
glycoprotein SgAbd-1 [C 
floridanus] 

Hexamerin [A. mellifera] 

Hexamerin 110 [A 
mellifera] 

PREDICTED: similar to 
CG1845-PA [A. mellifera] 

Hexamerin 110 [A 
mellifera] 

PREDICTED: similar to 
beta-ureidopropionase 
isoform 1 [A. mellifera] 

Hexamerin 110 [A 
mellifera] 

Hexamerin 110 [A 
mellifera] 

Short-chain 

dehydrogenase/reductase 
[A. mellifera] 

PREDICTED: similar to 
CG7658-PA [A. mellifera] 

Hexamerin [A. mellifera] 

Hexamerin [A. mellifera] 

Hexamerin 70b [A 
mellifera] 

PREDICTED: hypothetical 
protein [A. mellifera] 

Hexamerin 70c [A 
mellifera] 

PREDICTED: similar to 
sallimus CG1915-PC, 
isoform C [A. mellifera] 

Bombolitin [B. ignitus] 

Alpha-glucosidase [B. 
ignitus] 

Allergen-related G12 [H. 
saltator] 

PREDICTED: similar to 
beta-ureidopropionase 
isoform 1 [A. mellifera] 

Hymenoptaecin [B. 
ignitus] 

Cytochrome P450 4gl5 
[H. saltator] 

PREDICTED: similar to 
CG30045-PA [A. mellifera] 

Hexamerin [A. mellifera] 

Hymenoptaecin [B. 
ignitus] 



BAI44030.1 
XP_001 123230.1 
EFN60841.1 

ABR45904.1 
ABU92559.1 

XP_395348.2 

NP_001 094493.1 

XP_392773.2 

ABU92559.1 
BAI82215.1 
NP_001 01 1620.1 



ABR45905.1 
ABR45905.1 
NP_001 01 1600.1 

XP_001 121 939.1 

NP_001 0921 87.1 

XP_001 121 572.1 

ACY09649.1 
BAI44030.1 

EFN86980.1 

XP_392773.2 

ACA04899.1 

EFN85148.1 

XP_001 120541.1 

ABR45905.1 
ACA04899.1 



110 
5518 
5208 

41 
283 

34 

119 

17 

346 
147 
0 



XP_001 120498.1 2153 



1 

0 

2947 
0 
1 

7 

5 
31 



1 

106 
1355 
1 

0 



6 
1 

3 

45 

3384 
2638 
2428 



2376 
2304 
1760 

0 

1633 
1585 
15 

1512 

1496 

0 

1 
1 

0 

1190 

22 
29 
0 

943 
15 



2677 
0 
0 

434 
0 

38 
0 

34 

0 
0 
0 



0 
0 
337 

0 

0 

1176 

1677 
766 

1948 

11 

0 

1568 

0 

0 
1 



6080 
0 
0 

16 
0 

21 

0 

20 

0 
1 

0 



0 
0 

1922 

1691 
1685 

819 

7 

0 

2115 

0 

0 
0 



27 
0 
0 

55 
0 

26 
1 

32 

0 
0 
0 



8903 
5520 



10101 
8977 



0 5211 



5475 

0 3667 

38 2795 

1 2549 
18 2610 

0 2722 

0 2452 

0 1760 

0 2153 



0 1 1636 

0 0 1585 

99 1 77 3663 

0 1 1513 

0 0 1497 

16 305 3426 



7619 
6628 

5011 

4834 

4805 

4551 
4547 
3708 



837 4220 
0 2486 



2267 458 5492 

13 10 1239 

1345 1 1369 

576 762 5156 

0 0 1355 

0 0 944 

1171 0 1187 



3423 
3337 
3289 

3177 

3142 

2960 

2944 
2822 

2658 

2316 

2227 
2201 
2187 
1971 
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Table 2 R-STAT analysis for the life cycle stages of B. terrestris, (Continued) 



BTO3135_1 


Cytochrome 
P450 


Cytochrome P450 4G11 
[A. mellifera] 


NP_001 035323.1 84 


10 


1990 


1472 


541 


827 


4924 


1944 


B^12993_l 


Carbonic 
anhydrase 


Hypothetical protein 
SINV_10521 [5. invicto] 


EFZl 8639.1 1732 


8 


134 


49 


252 


30 


2205 


1937 


B™5272_1 


Hypothetical 
protein 


PREDICTED: similar to 
CG3246-PA [A. mellifero] 


XP_395658.3 7 


41 


1792 


871 


1453 


1399 


5563 


1916 



a = Total number of larva ESTs (n = 286501); b = Total number of pupa ESTs (n = 162608); c = Total number of workerl ESTs (n = 318664); d = Total number of 
worker2 ESTs (n = 194969); e = Total number of male ESTs (n = 257318); f = Total number of gyne ESTs (n = 221683); g = Total number of ESTs (n = 1441743). 
Mapped read contributions from each sequenced life cycle stage were analyzed using R-STAT to identify contigs with potential differential expression. Read 
counts are provided for each life cycle stage with the corresponding R-value. BT_transcriptome_v2 contigs (BTT_contig) reference, functional description, best 
NCBI nonredundant (nr) BLAST match description and GenBank accession are provided for each contig. 



Gene expression in the male 

The male sample had a high expression of genes 
involved in immunity (see Table 2). The male contribu- 
ted a high proportion of ESTs to a contig similar to the 
allergen protein G12 (BTT05276_2) and also exhibited 
high expression of two contigs encoding antimicrobial 
peptides identified as hymenoptaecins (BTT24170_1 and 
BTT24170_2). Within the top ten genes that distin- 
guished the male (see Additional file 6, Table S6e) a 
third hymenoptaecin contig was identified 
(BTT36277_1). Immunity aside, the male had high 
expression of genes involved in metabolism (i.e. sentrin- 
like protease; BTT06274_2) and a serine carboxypepti- 
dase (BTT05501_1), flight (i.e. the muscle protein titin; 
BTT05775_1) and cuticle formation (i.e. the peritrophin 
like gene BTT05289_1). Two contigs of unknown func- 
tion were also overexpressed in the male. One had a 
predicted fibronectin-like domain (BTT00570_1) while 
the second contig had transmembrane and signal pep- 
tide domains (BTT09205_1). 

Gene expression in the larva 

The pre-adult stages accounted for 19 of the top thirty 
highest R-value annotated contigs (6 from the larva and 
13 from the pupa). The larva (Table 2) had elevated 
expression of proteins involved in cuticle formation 
(BTT03121_2, BTT05366_1, BTT20843_1 and 
BTT20746_1), and amino acid storage (hexamerin 70b 
BTT18911_1). A member of the cytochrome P450 
superfamily (BTT20966_1) was also highly expressed in 
the larva. Within the larva we identified elevated expres- 
sion of additional genes involved in development and 
cuticle formation (Additional file 6, Table S6a). The 
enzyme, carbonic anhydrase, which is involved in meta- 
bolism, was highly expressed (BTT12993_1). Two func- 
tionally unidentified contigs were also overexpressed, 
one of which (BTT35627_1) had predicted transmem- 
brane and signal peptide domains. 

Gene expression within the pupa 

In contrast to the high expression of cuticular and 
structural proteins in the larva, the pupa had a higher 



expression of a diversity of amino acid storage proteins 
(Table 2; Additional file 6, Table S6b). Of the amino 
acid groups, seven contigs matched hexamerins: four to 
hexamerin 110 (BTT05433_2, BTT05442_1, 
BTT35255_1 and BTT20945_1), one to hexamerin 70c 
(BTT05437_1) and two to an unclassified hexamerin 
(BTT23482_1 and BTT35822_1). There was a high 
pupal EST contribution to a contig matching a pere- 
grin-like protein, which has a role in dorsal/ventral axon 
guidance, an important biological process during trans- 
formation from larva to adult. Of the three contigs puta- 
tively matching metabolic enzymes, two contigs 
(BTT05434_2 and BTT38955_1) matched beta-ureido- 
propionase, an enzyme involved in metabolism of pyri- 
midine and beta-alanine, while one matched a short- 
chain dehydrogenase-reductase (BTT17751_1). Vitello- 
genin, which has a role in lipid transport, was highly 
expressed in the pupa (BTT07410_1) in comparison to 
the larva. 

Differential expression of immunity-related genes across 
the B. terrestris life cycle 

The development and expression of the immune response 
in B. terrestris is particularly interesting (see Introduction). 
We identified B, terrestris homologues of the four major 
A, mellifera immune signalling pathways {Toll, Immune 
deficiency (ImD), Janus kinase and signal transducer and 
activator of transcription (JAK-STAT), and JNK immune 
signalling) in BT_transcriptome_v2 contigs, and analysed 
these for expression differences among life cycle stages 
(Figure 4). Of the Toll pathway components, expression 
was uniform across life cycle stages apart from two signal- 
ling components. Tube and Pelle, Neither was expressed 
in the pupa, while Tube was only detected in the female 
adult stages (workers and gynes). For the ImD, JNK, and 
JAK-STAT signalling pathways, two components were not 
present in BT_transcriptome_v2: the transmembrane 
receptor Domeless, and the transcription factor for the 
JNK pathway, JNK MAP Kinase basket. Components of 
these three pathways were detectably expressed in all 
stages, but TEPA, DREDD, and TGF-p activated kinase 1 
(TAKl) were only expressed in workers. Sadd et al. [31] 
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JAK/STAT 
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Figure 4 The immune system in Bombus terrestris. The B. terrestris transcriptome data were screened for the presence of components of the 
major immune signalling pathways using reciprocal tBLASTx searches between BT_transcriptome_v2 and A mellifera sequences. Some 
components had stage limited expression, as indicated by the coloured circles. Pathway components also identified by Sadd et al. [31] are 
indicated by star symbols. 



also enumerated pathway component expression in work- 
ers, but we identified more complete coverage of expected 
pathways, including a complete Toll signalling pathway. 

Ten immunity-related effector genes were differentially 
expressed across the sequenced specimens (Table 3). 
Nine of these were expressed in all life cycle stages (the 
antimicrobial peptide (AMP) abaecin was absent from 
the larva). Phenoloxidase subunit A3 (BTT08527_1), 
which functions in the production of melanin and other 
polyphenic compounds for both cuticle biosynthesis and 
immune encapsulation, had highest expression in the 
pupa. A C-type lectin (BTT07416_1) and a gram-negative 
binding protein (BTT09196_1), which both have roles in 
immune detection of bacteria, had elevated expression in 
the larva, while the gram-positive binding peptidoglycan 
recognition protein (PGRP) SA was highly expressed in 
the gyne (BTT21344_1). Transferrin, an iron chelator 



that impacts on the survival of bacteria within a host, 
matched 22 contigs with differential expression, of which 
20 contigs had high EST contributions in the male while 
two contigs had high EST contribution from the gyne 
(BTT35862_1) and the larva (BTT35539_1), respectively 
(Table 3). The male had elevated expression of the AMPs 
abaecin, apidaecin, hymenoptaecin and defensin 1 (Table 
3: Anti-microbial peptides). Several distinct contigs 
encoding the BombuS'Speciiic AMP bombolitin 
(BTT20391_1, BTT34958_1, BTT35608_1 and 
BTT40712_1), a constituent of venom, showed elevated 
expression in the female adults, especially the workers. 

Differential expression of olfaction genes in B. terrestris 

We expected that olfaction would differ among castes 
and life cycle stages because of their differing needs for, 
and responses to, social and other cues. We examined 
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Table 3 Potential differential expression of immunity-related genes across the life cycle stages of B. terrestris, 

BTT_contig Best BLAST Match and GenBank Accession [species] Read counts per contig 



Immune-related Genes Larva^ Pupa'' Workerl Worker2 Male Gyne Total R- 













c 


d 


e 


f 


g 


value 


BTO8527_1 


Phenoloxidase subunit A3 [A. mellifero] 


NP_001 01 1627.1 


59 


78 


4 


1 


3 


6 


151 


123 


BTO4384_1 


Transferrin [A. mellifera] 


AA039761.1 


38 


58 


106 


68 


530 


152 


952 


395 


BTO5190_1 


Transferrin [A. mellifero] 


AA039761.1 


1 


0 


7 


0 


27 


4 


39 


25 


BTO5213_1 


Transferrin [A. mellifero] 


AA039761.1 


94 


83 


199 


96 


292 


238 


1002 


95 


BTO5261_1 


Transferrin [A. mellifero] 


AA039761.1 


12 


9 


86 


13 


148 


64 


332 


113 


BTO5441_1 


Transferrin [A. mellifero] 


AA039761.1 


7 


3 


13 


17 


49 


20 


109 


29 


BTO5539_1 


Transferrin [A. mellifero] 


AA039761.1 


34 


0 


15 


30 


4 


17 


100 


28 


BTO5592_1 


Transferrin [A. mellifero] 


AA039761.1 


9 


0 


13 


3 


74 


20 


119 


66 


D 1 1 O JO 1 D_ \ 


Transferrin [A. mellifero] 


AAUoy/D 1 . 1 


o 

y 


D 


OQ 
ZO 


1 D 


o4 


o4 


\ J/ 


OJ 


BTO5644_1 


Transferrin [A. mellifero] 


AA039761.1 


10 


14 


55 


9 


117 


53 


258 


82 


BTO5679_1 


Transferrin [A. mellifero] 


AA039761.1 


0 


0 


60 


3 


119 


37 


219 


134 


BTO5708_1 


Transferrin [A. mellifero] 


AA039761.1 


0 


0 


8 


0 


24 


3 


35 


23 


BTO5750_1 


Transferrin [A. mellifero] 


AA039761.1 


4 


5 


35 


6 


55 


27 


132 


39 


BTO5862_1 


Transferrin [A. mellifero] 


AA039761.1 


0 


0 


15 


0 


0 


150 


165 


236 


BTO5895_1 


Transferrin [A. mellifero] 


AA039761.1 


0 


0 


10 


0 


31 


10 


51 


31 


BTO5931_1 


Transferrin [A. mellifero] 


AA039761.1 


13 


0 


18 


1 


35 


17 


84 


24 


BTO5947_1 


Transferrin [A. mellifero] 


AA039761.1 


0 


6 


8 


1 


37 


8 


60 


33 


B™7416_1 


PREDICTED: similar to CG3244-PA, partial [A 
mellifero] 


XP_624536.1 


375 


9 


6 


4 


7 


38 


439 


463 


BTO1344_1 


Peptidoglycan-recognition protein SA [A 

mellifero] 


NP_001 1571 87.1 


56 


159 


116 


22 


72 


169 


594 


121 


D 1 1 uy 1 yo_ 1 


Gram-negative bacteria-binding protein 1-2 
[A. mellifero] 

Anti-microbial Peptides 


Nr_UU \ \ d/ \ oo. 1 


141 


1 0 


90 


57 


86 


26 


41 0 


56 


BTO9615_1 


Abaecin [A. mellifero] 


AAA67442.1 


0 


0 


0 


0 


24 


0 


24 


30 


B™8491_1 


Abaecin [A. mellifero] 


AAA67442.1 


0 


0 


0 


0 


45 


0 


45 


64 


BTO0802_1 


Abaecin [A. mellifero] 


AAA67442.1 


0 


5 


0 


0 


122 


6 


133 


174 


BTO7089_1 


Abaecin [A. mellifero] 


AAA67442.1 


0 


0 


2 


1 


57 


6 


66 


71 


BTO8306_1 


Abaecin [A. mellifero] 


AAA67442.1 


0 


0 


1 


1 


31 


0 


33 


38 


B™5666_1 


Apidaecin [A. mellifero] 


CAA5 1168.1 


3 


7 


1 


3 


55 


3 


72 


59 


BTO0828_1 


Apidaecin [A. mellifero] 


CAA5 1168.1 


9 


33 


7 


9 


128 


16 


202 


123 


BTO5749_1 


Apidaecin [A. mellifero] 


CAA5 1168.1 


2 


2 


14 


24 


8 


29 


79 


27 


BTO8204_1 


Apidaecin [A. mellifero] 


CAA5 1168.1 


0 


4 


1 


2 


25 


0 


32 


27 


B™i196_1 


Apidaecin [A. mellifero] 


CAA5 1168.1 


0 


11 


2 


3 


137 


7 


160 


180 


B™1985_1 


Apidaecin [A. mellifero] 


CAA5 1168.1 


4 


13 


3 


4 


81 


2 


107 


90 


BTO0391_1 


Bombolitin [B. Ignitus] 


ACY09649.1 


5 


1 


1677 


1691 


9 


837 


4220 


2944 


BTO4958_1 


Bombolitin [B. Ignitus] 


ACY09649.1 


1 


0 


49 


18 


1 


15 


84 


46 


BTO5608_1 


Bombolitin [B. Ignitus] 


ACY09649.1 


0 


0 


201 


98 


0 


180 


479 


317 


B™0712_1 


Bombolitin [B. Ignitus] 


ACY09649.1 


0 


0 


341 


148 


1 


133 


623 


422 


B^l 0405_1 


Defensin 1 [B. terrestris] 


ADB29129.1 


2 


2 


0 


0 


85 


0 


89 


122 


BTO1016_1 


Defensin 1 [B. terrestris] 


ADB29129.1 


0 


0 


0 


0 


179 


0 


179 


289 


B™1393_1 


Defensin 1 [B. terrestris] 


ADB29129.1 


3 


0 


3 


3 


25 


1 


35 


21 


B™2034_1 


Defensin 1 [B. terrestris] 


ADB29129.1 


0 


2 


3 


3 


47 


0 


55 


56 


BTO4170_1 


Hymenoptaecin [A. mellifero] 


AAA67444.1 


1 


22 


0 


0 


1345 


1 


1369 


2226 


BTO41 70_2 


Hymenoptaecin [A. mellifero] 


AAA67444.1 


0 


15 


1 


0 


1171 


0 


1187 


1948 


BTO6277_1 


Hymenoptaecin [A. mellifero] 


AAA67444.1 


0 


8 


3 


0 


882 


0 


893 


1460 


BTO7570_1 


Hymenoptaecin [A. mellifero] 


AAA67444.1 


0 


0 


0 


0 


41 


0 


41 


57 


B™1217_1 


Hymenoptaecin [A. mellifero] 


AAA67444.1 


0 


0 


0 


0 


70 


0 


70 


105 


B^18071_l 


Hymenoptaecin [A. mellifero] 


AAA67444.1 


0 


1 


0 


0 


27 


0 


28 


34 


BTO5620_1 


Hymenoptaecin [A. mellifero] 


AAA67444.1 


0 


0 


40 


10 


13 


15 


78 


30 
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Table 3 Potential differential expression of immunity-related genes across the life cycle stages of B. terrestris, 

(Continued) 



BTO6443_1 


Hymenoptaecin [A. mellifera] 


AAA67444.1 


12 


34 


17 


10 


47 


38 


158 


29 


BTO7051_1 


Hymenoptaecin [A. mellifera] 


AAA67444.1 


0 


0 


0 


0 


183 


0 


183 


296 


B™i185_1 


Hymenoptaecin [A. mellifera] 


AAA67444.1 


0 


6 


0 


0 


64 


0 


70 


90 


B™2611_1 


Hymenoptaecin [A. mellifera] 


AAA67444.1 


0 


0 


0 


0 


20 


0 


20 


24 



a = Total number of larva ESTs (n = 286501); b = Total number of pupa ESTs (n = 162608); c = Total number of workerl ESTs (n = 318664); d = Total number of 
worker2 ESTs (n = 194969); e = Total number of male ESTs (n = 257318); f = Total number of gyne ESTs (n = 221683); g = Total number of ESTs (n = 1441743). 
immunity-related genes were identified in BT_transcriptome_v2 and are shown with read counts for each life cycle stage. The highest amount of ESTs per contig 
is highlighted in bold. 



the presence and differential expression of olfaction- 
related genes, namely the odorant-binding proteins 
(OBPs) and the chemosensory proteins (CSPs) (Table 4). 
Both larva and pupa had low expression of OBPs, with 
the exception of high expression of OBP13 in the pupa 
(Table 4 and Figure 5). In addition, OBP2 (BTT27418_1, 
BTT42039_1) OBPS (BTT35749_1) and OBP4 
(BTT41749_1), OBP13 (BTT26653_1) were expressed in 
the pupa while OBPS and OBP13 were only expressed in 
the larva (Figure 5). Four CSPs, namely CSP2 
(BTT00638_1), CSPS (BTT19102_1), CSP4 
(BTT17558_1, BTT26377_1) and CSP6 (BTT35682_1), 
were expressed in the larva with particularly high expres- 
sion of CSPS (BTT19102_1) and heightened expression 
of CSP4 in comparison to the pupa (BTT34298_1). The 
pupa only expressed three CSPs (CSP2, CSPS and CSP4). 



The adult stages had a heightened expression of OBP2 
and OBPS in comparison to the larval and pupal stages 
(Table 4). Contigs identified as OBP2 (BTT08587_1; 
BTT11797_1, BTT20806_1 and BTT40878_1) and OBPS 
(BTT0562S_1 and BTT20849_1) had elevated expression 
in the adult stages in comparison to the larva, and were 
not expressed in pupa. OBPlS-like proteins were highly 
expressed in the pupa (BTT20296_1) and the male 
(BTT16540_1; BTT2665S_1). In addition, OBPl and 
OBP6 were detected in all adult stages but in neither 
the larva nor the pupa. OBPll was only expressed in 
the sexuals (male and gyne) while OBP9 was only 
expressed in the male. Four CSPs expressed in the larva 
were expressed in the adult stages. There was an ele- 
vated expression of CSP4 in the female adults in com- 
parison to the male. 



Table 4 Potential differential expression of olfaction-related genes in the life cycle stages of B. terrestris. 



BTT_contig 


Best BLAST Match and GenBank Accession [species] 








Read counts per contig 








Odorant binding proteins 


Larva^ 


Pupa^ 


Workerl^ Worker2'' 


Male^ 


Gyne^ 


Total^ 


R-value 


B™8587_1 


OBP 2 [A. mellifera] 


NP_00101 1591.1 


0 


0 


129 


83 


97 


172 


481 


193 


B^l 1 797_1 


OBP 2 [A. mellifera] 


NP_00101 1591.1 


0 


0 


19 


14 


40 


11 


84 


35 


BTO0806_1 


OBP 2 [A. mellifera] 


NP_00101 1591.1 


1 


0 


63 


109 


86 


61 


320 


129 


B™0878_1 


OBP 2 [A. mellifera] 


NP_00101 1591.1 


0 


0 


11 


3 


19 


18 


51 


20 


B™5623_1 


OBP 3 [A. mellifera] 


NP_001 03531 1.1 


0 


0 


32 


11 


110 


77 


230 


134 


BTO0849_1 


OBP 3 [A. mellifera] 


NP_001035311.1 


0 


0 


56 


32 


46 


100 


234 


103 


B^16540_l 


OBP 13 [A mellifera] 


NP_00 10353 14.1 


0 


0 


0 


0 


31 


0 


31 


41 


BTO0296_1 


OBPl 3 [A mellifera] 


ABD92645.1 


93 


776 


116 


22 


22 


91 


1120 


1045 


BTO6653_1 


OBPl 3 [A mellifera] 


ABD92645.1 


0 


1 


0 


0 


47 


0 


48 


66 




Chemosensory Proteins 


















B™0638_1 


CSP 2 [A. mellifera] 


NP_001 071 278.1 


8 


30 


7 


6 


19 


5 


75 


23 


Bm9102_l 


CSP 3 precursor [A. mellifera] 


NP_001 01 1583.1 


467 


16 


56 


19 


64 


126 


748 


385 


B^l 7558_1 


CSP 4 [A. mellifera] 


NP_001 071 282.1 


34 


20 


53 


115 


66 


31 


319 


57 


BTO0849_1 


CSP 4 [A. mellifera] 


NP_001 071 282.1 


0 


0 


56 


32 


46 


100 


234 


103 


BTO6377_1 


CSP 4 [A. mellifera] 


NP_001 071 282.1 


15 


15 


68 


69 


67 


43 


277 


38 


BTO4298_1 


CSP 4 [A. mellifera] 


NP_001 071 282.1 


45 


0 


3 


3 


8 


4 


63 


39 


BTO9167_1 


CSP 4 [A. mellifera] 


NP_001 071 282.1 


11 


8 


52 


39 


37 


33 


180 


22 


BTO5682_1 


CSP 6 [A. mellifera] 


NP_001 071 287.1 


1 


0 


37 


16 


99 


25 


178 


92 



a = Total number of larva ESTs (n = 286501); b = Total number of pupa ESTs (n = 162608); c = Total number of workerl ESTs (n = 318664); d = Total number of 
worker2 ESTs (n = 194969); e = Total number of male ESTs (n = 257318); f = Total number of gyne ESTs (n = 221683); g = Total number of ESTs (n = 1441743). 
Potential odorant-blnding proteins and chemosensory proteins in the BT_transcriptome_v2 contigs are shown, with read counts for each of the life cycle stages. 
The highest amount of ESTs per contig are highlighted in bold. 
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Odorant Binding Proteins 


Chemosensory Proteins 




OBPl 


0BP2 


0BP3 


0BP4 


0BP6 


0BP9 


OBPll 


0BP13 


CSP2 




CSP4 


CSP6 


Larva 


























Pupa 














— 












Worker 1 


























Worker 2 


























Male 
















' — 











Gyne 





























Absent: No EST contribution to matching contigs 




Low expression: 1-100 ESTs contributed to matching contigs 




High expression: 101-200 ESTs contributed to matching contigs 




Very high expression: >200 ESTs contributed to matching contigs 



Figure 5 Olfaction in bumblebees. BT_transcriptome_v2 contigs were screened for the presence of olfaction-related genes, odorant-binding 

proteins (OBPs) and chemosensory proteins (CSPs). Some genes had stage-specific expression as indicated by the colour scheme. 
J 



Discussion 

We have used Roche 454 deep sequencing to define and 
compare the transcriptomes expressed by life cycle 
stages and castes of B. terrestris. Together, these data 
form a comprehensive gene catalogue for this ecologi- 
cally and economically important species. The de novo 
assembly had high contiguity, with a mean contig length 
of 1,102 bases. The G-hC content for the contigs was 
36%, which is similar to the worker-sequencing results 
of Sadd et al. [31]. The majority of contigs (approxi- 
mately 58%) had significant BLAST matches to pre- 
viously described proteins. The remaining 42% may 
derive from untranslated regions of unassembled 
mRNAs, noncoding RNAs or transcriptional noise 
(retained introns and similar), or, more interestingly, 
from B. terrestris genes that are either novel or have 
diverged significantly from any sequenced relatives. 
Comparison of the B, terrestris transcriptome against 
the emerging B. terrestris genome sequence identified 
matches for 95.9% of the contigs, and we expect that 
our data will be of utility in annotation efforts for this 
genome (which is currently being carried out by the 
Bombus sequencing consortium; K. Worley, P. Schmid- 
Hempel, G. Robinson, pers.comm.). Separate sequencing 
of individual life stages permitted the identification of 
potentially differentially expressed genes across the life 
cycle stages using R-STAT, and identified potentially 
important candidate genes underpinning stage- and 
caste-specific phenotypes. While we drew samples from 
two natal colonies, comparison of expression between 
the workers from the two colonies demonstrated signifi- 
cant similarity, justifying an overall comparison of pre- 
adult stages and adult stages. 

We used R-STAT analyses to identify a list of poten- 
tially differentially expressed contigs, in order to provide 
direction for future studies examining gene expression 
within or across life cycle stages of B. terrestris. 



However, some data already exist in support of this 
approach. In a previous study by Pereboom et al. [11], 
northern blot analyses examined expression differences 
between whole bodied larvae and adults (queen and 
worker stages) and identified three genes (endocuticle 
structural glycoprotein; hexamerin 70b; and 60S acidic 
ribosomal protein P2) that were more highly expressed 
in larvae than in adults, where expression was absent. 
These genes match contigs generated from our tran- 
scriptomic R-STAT analyses that show the same larva- 
adult expression pattern, providing independent support 
for the biological validity of the results discussed below. 

Differential gene expression linked to developmental 
processes 

We identified a number of key differences in genes 
implicated in developmental processes across the differ- 
ent life stages. For example, the larval data showed high 
expression of genes involved in cuticle biogenesis. As 
the larval stage represents a period of feeding and 
growth, undergoing four developmental moults over a 
fourteen day period, this upregulation of cuticular pro- 
teins and endocuticle structural glycoproteins is 
expected. Other contigs identified in the transcriptome 
with similar expression profiles are thus likely candi- 
dates for genes with similar stage-specific roles. For 
example, a short-chain dehydrogenase/reductase (SDR) 
had elevated expression in the pupa. SDRs function in 
hormone and steroid metabolism [43], and in social 
insects, may be involved in stage and caste differentia- 
tion, for example through binding juvenile hormone. An 
SDR was demonstrated to be differentially expressed at 
the mRNA level in the ovaries in developing larvae of 
honeybee workers and queens, and SDR gene expression 
was higher in the ovaries of worker larvae in compari- 
son to queen larvae resulting in possible inhibition of 
ovary development [44]. The likely co-orthologues 
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found in our study of the A. mellifera SDR analysed by 
Guidugli et al. [44] were derived solely from pupal 
reads, while Guidugli et al. [44] reported expression 
being highest in final stage honeybee larvae. Thus, we 
propose that these SDRs are candidate genes for further 
investigation into their roles in bumblebee caste 
determination. 

Amino-acid storage protein expression across the 
bumblebee life cycle 

Hexamerins are amino acid storage proteins, related to 
tyrosinases [45], used during non-feeding periods of 
adult development to provide amino acids and energy 
[46]. Hexamerins also have roles in the binding of juve- 
nile hormone during insect development, impacting on 
growth regulation within the larval stage [47-49]. A 
number of studies [50-53] have demonstrated differen- 
tial expression of hexamerin proteins (hexamerin 70a, 
hexamerin 70b, hexamerin 70c and hexamerin 110) at 
different stages of development amongst the sexes and 
castes in the honeybee. In B. terrestris, the pupal sample 
had elevated expression of hexamerin 110-like genes 
and a hexamerin 70 homologue, while the larva had ele- 
vated expression of distinct hexamerin 70-like genes, 
and the gyne elevated expression of four additional hex- 
amerin 70-like genes. Thus while only four hexamerin- 
encoding genes were identified in A, mellifera [53], we 
have identified potentially ten in B, terrestris that show 
differential expression between life cycle stages and 
castes, suggesting that these proteins may play complex 
roles in bumblebee development. 

Why would gynes express high levels of amino acid 
storage proteins? Honeybee virgin queens have a 
higher expression of hexamerins in comparison to 
workers, with the hexamerins functioning in gonad 
development [53], whilst in the wasp, P, metricus, 
developing gynes have higher quantities of Hex 1 than 
workers [54]. Studies on hexamerins in other social 
insects, such as ants and termites, have identified a 
correlation between depletion of hexamerins within 
queens and colony formation [55,56]. Therefore, this 
potentially high expression of hexamerins would be 
important for a B. terrestris gyne from a colony forma- 
tion viewpoint. However, as B, terrestris queens 
undergo a prolonged hibernation after mating, amino 
acid storage proteins may be important for maintaining 
functional operation of crucial biological processes 
during a period of intense stress. As hibernation is a 
key stage in the life cycle of bumblebees, many species 
of which are in drastic worldwide decline [57], our 
results provide direction for future work to analyse the 
mechanisms behind successful hibernation in these 
insects. 



Genes involved in adult behaviour and physiology 

Workers had elevated expression of enzymes, such as 
alpha-glucosidase and a muscle-specific lipase, that 
would be important for worker task completion. Alpha- 
glucosidase is involved in carbohydrate metabolism and 
utilised by foraging honeybees to metabolise nectar into 
fructose and glucose [58], while the muscle lipase is 
important for breaking down lipids during periods of 
high activity [59]. Interestingly, in our study both 
enzymes were expressed at an early stage in the adult 
workers' life (only 72 hours old), in contrast to their 
temporal pattern of expression in A, mellifera workers. 
Honeybee workers demonstrate a strict temporal poly- 
ethism, where younger workers perform nursing duties 
while older workers forage [60], and it is these foraging 
workers that exhibit higher expression of these enzymes. 
In comparison, there is no strong age-dependent divi- 
sion of labour within bumblebees [61]. Alloethism 
within bumblebee workers has been correlated with size, 
with studies identifying larger workers performing fora- 
ging tasks while smaller workers perform in-nest func- 
tions, although this division of labour can change 
depending on the requirements of the colony [30]. Thus 
future work might focus on size- and age-related differ- 
ences in gene expression between B. terrestris workers 
in relation to their subroles within the colony. 

Males are underrepresented in genomic studies in 
social insects as the emphasis has been almost exclu- 
sively on females. In the current study, the male had ele- 
vated expression of titin, a muscle protein, expressed in 
the insect flight muscles [62]. As the male bumblebee 
requires flight for foraging, patrolling and indeed mat- 
ing, high expression of flight-specific muscles would be 
required. In relation to behaviour, the male had a high 
expression of a neuroparsin, queen brain selective pro- 
tein 1, which has been suggested to function in caste 
determination during honeybee development through 
manipulation of the insect insulin-like pathway [63]. 
However, neuroparsins have been suggested to play 
roles in a wide variety of functions, including reproduc- 
tion [64]. Therefore, a neuroparsin-like protein in the 
bumblebee may have male-specific expression in relation 
to its behaviour or physiology. The male had elevated 
expression of several genes that matched hypothetical or 
otherwise unannotated proteins in the genomes of A. 
mellifera, C. floridanus and S. invicta. It is particularly 
interesting that the male received so many fully unanno- 
tated protein matches (n = 851 contigs). This may sug- 
gest possible novel expression associates with male- 
specific behaviour and/or physiology. Thus, these data 
offer valuable insights into the mechanistic basis of male 
biology in social insects, which has largely been ignored 
by previous studies (see references above). 
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The immune response in males 

Even in the absence of overt infectious challenge, the 
background level of immune defence is likely to be 
regulated through the bumblebee life cycle. We did not 
explicitly challenge the sampled bees with pathogens, 
but also did not keep them in germ-free environments 
(pre-adults in their natal colony, adults in nurseries), 
and thus we expect a background level of immune acti- 
vation. B. terrestris queens are monandrous, mating only 
once [65], while males can mate up to eight times [66]. 
In addition, B. terrestris exhibits highly male-biased sex 
ratios [67]. Together, this suggests high levels of compe- 
tition among the males to mate with gynes. Conse- 
quently, males should invest in reproductive fitness, 
which has been demonstrated in other insect species to 
trade-off against immunity (e.g. Anopheles gambiae 
[68]). We identified genes involved in pathogen recogni- 
tion, the transduction of recognition signals, and 
immune effectors, and analysed their patterns of expres- 
sion for data to support this hypothesis. Surprisingly, in 
contrast to our expectations based on life-history theory, 
the male had elevated expression (compared to other 
stages) of AMPs involved in the removal of infectious 
agents as part of the immune system [69-72], including 
hymenoptaecin, defensin, abaecin and apidaecin. This is 
the first account of an apidaecin-type protein being 
expressed in B, terrestris, Wilfert et al. [73] found no 
trade-offs between either branch of the immune system 
(prophenoloxidase (PPO) and AMP) and reproductive 
investment, but rather a positive correlation between 
AMPs and reproduction. Wilfert et al. [73] suggest the 
basis for the positive correlation may be because males 
pass on AMPs with their sperm to mates during copula- 
tion. However, the male in our study was very young 
and we sampled the whole body rather than just repro- 
ductive tissue, making the mating-gift hypothesis less 
convincing. Unlike the majority of social insect males, 
bumblebee males do not remain within the colony post- 
emergence from the pupal case. Bumblebee males forage 
for themselves and spend the majority of their time 
patrolling scent-marked trails [26]. Bumblebee males 
can survive outside the colony for up to 60 days 
(Brown, M.J.F., unpublished data). Thus males cannot 
take advantage of proposed colony-level social immunity 
[74] and a primed immune system might be an adapta- 
tion to life outside the colony. Thus, our results suggest 
that the life-history differences between males (effec- 
tively solitary) and females (colonial) may impose diver- 
gent selection on expression of immune system genes in 
social insects. 

Olfaction in the bumblebee 

Olfaction and the ability to discriminate a number of 
volatiles is of immense importance to insects in general. 



and social insects in particular. They must recognise 
nest-mates, discriminate and control subordinates, select 
mates, and discriminate between a wide range of plants 
for food collection. Here we discuss two classes of olfac- 
tion genes, the odorant binding proteins (OBPs) and the 
chemosensory proteins (CSPs). 

In the honeybee genome, 21 OBP genes have been 
identified and examined for patterns of expression [75]. 
Within our transcriptome dataset, we found significant 
matches for eight A, mellifera OBPs (OBPl, OBP2, 
OBP3, OBP4, OBP6, OBP9, OBPll and OBP13). This 
enables us to compare their expression in bumblebees 
to that of their homologues in honeybees. In the honey- 
bee, OBPl, OBP4, OBP6 and OBPll are expressed 
exclusively in the antennae of adults [75] and OBPll 
was identified as having gender-specific expression 
(absent from honeybee drones), suggesting a role in 
female recognition of mates [75]. We found low expres- 
sion of a B. terrestris orthologue of A. mellifera OBPl 
(formerly known as antennal specific protein 1), which 
is involved in binding of queen pheromone in honeybees 
[76]. In honeybees, OBPl is expressed in workers' and 
drones' antennae after approximately 14 days post-emer- 
gence [76]. As we sampled our bees before this late 
timepoint, level of expression of the OBPl homologue 
may simply be due to developmental staging. OBP6 was 
expressed in all the adult bumblebee stages, but not the 
larva and pupa, matching expression patterns in the 
honeybee. In contrast, OBPll was expressed in only the 
male and gyne. Consequently, OBPll in the bumblebee 
may have a role in mate recognition within both sexes, 
as opposed to the putative female-specific role in 
honeybees. 

OBP2 had elevated expression in all adult stages of B. 
terrestris. In honeybees, OBP2 is expressed specifically 
in the antenna with weak expression in the legs and 
head, possibly from chemosensory sensilla in these body 
parts [75]. In contrast, OBP3 is ubiquitously expressed 
in all adult body parts of the honeybee with the excep- 
tion of the antennae [75]. In B. terrestris, OBP3 was 
highly expressed in all the bumblebee adult stages, with 
higher expression in the gyne than in the worker, but 
was absent from the larva and pupa, which again 
matches honeybee expression patterns. OBP9 is only 
expressed in the ovaries and eggs of the queen in honey- 
bees [75] but in B. terrestris the male was the only sam- 
ple to show expression of OBP9. Lastly, OBP13 was 
highly expressed in the B, terrestris pupa and male, but 
Foret and Maleszka [75] identified expression of A. mel- 
lifera OBP13 in old larvae, with expression continued 
throughout the pupal stage. It appears that expression 
of these odorant binding proteins has been conserved in 
some cases, whilst diverging in others, presumably in 
response to taxon-specific selection processes. 
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We identified four putative CSPs (homologues of 
CSP2, CSPS, CSP4 and CSP6 in A, mellifera) in the B, 
terrestris transcriptome. B, terrestris CSPS and CSP4 had 
elevated expression in the larva, matching the results of 
Foret et al [77] who found A, mellifera CSPS to exhibit 
highest expression in the larva before pupal and imagi- 
nal moults. While Briand et al. [78] proposed that CSPS 
had a role in brood pheromone recognition, Foret et al. 
[77] proposed that CSPS may play a role in cuticle 
maturation. In A. mellifera, CSP4 expression was 
restricted to olfactory tissues in adult, but not pre-adult 
stages [77]. In contrast, while we detected CSP4 expres- 
sion in adult B, terrestris, the highest expression was in 
the larva. In honeybees, CSP2 was expressed at low 
levels throughout the life cycle stages while CSP6 was 
expressed throughout the larva, pupa and adult stages, 
with elevated tissue-specific expression in queen ovaries 
and eggs [77]. The expression of CSP2 and CSP6 across 
the bumblebee life stages is consistent with that of the 
honeybee. We were surprised not to have detected a 
homologue of A. mellifera CSPl, which is expressed ubi- 
quitously across honeybee life cycle stages. Finally, A, 
mellifera CSPS is expressed only by mature queen hon- 
eybees in eggs and ovaries, stages we did not sample in 
B, terrestris. 

Overall, the examination of olfaction-related genes in 
B. terrestris reveals both similarities and also striking 
differences from their expression in A. mellifera. We 
expect that these divergences reflect the different social 
structuring of bumblebee compared to honeybee colo- 
nies, and in particular the differing roles and strategies 
adopted by different castes in these two species. Again, 
our results provide indications of fruitful lines of future 
research into how patterns of gene expression relate to 
the evolution of primitive (bumblebees) and advanced 
(honeybees) sociality. 

Conclusions 

The role of differential gene expression in polyphenism 
within groups is only beginning to be understood, e.g. 
[4,11-15,79]. As a step towards understanding genes 
influential in the expression of specific phenotypes 
throughout an individual's life cycle stage, we performed 
a transcriptome-wide analysis of individual specimens of 
the pre-adult stages, castes and sexes of the bumblebee 
to improve our knowledge of differential gene expres- 
sion across the life history of this ecologically and agri- 
culturally important pollinator. We have identified sets 
of genes that are candidates for regulators and effectors 
of different phenotypes, and revealed the differing phy- 
siological states of each morph. These candidate genes 
will prove important for future genomic and proteomic 
studies on B, terrestris. 



The B. terrestris genome is being sequenced, and a 
BAC library and a number of genetic linkage maps have 
been constructed to provide greater genomic resources 
for this important insect [7S, 80-82]. Our study contri- 
butes to this global effort through 504 Mb of Roche 454 
transcriptomic data for B. terrestris. We provide a cen- 
tral, web-based transcriptomics resource for B. terrestris 
[8S], that facilitates user querying of sequences and 
functional annotations. These data are now a bridgehead 
into deeper, molecular analyses of the physiological, 
genetic and evolutionary bases of polyphenism, and the 
further development of the bumblebee as a model social 
insect. 

Methods 

Animals 

B. terrestris colonies (Koppert, Netherlands) were main- 
tained at 27 ± 1°C (45-50% RH) under red light. Pollen 
and sugar water (Apilnvert®) were supplied to the colo- 
nies ad libitum. Two colonies were chosen where the 
founding queen was present, only worker callows were 
emerging from hatching pupal cases and sexual off- 
spring, i.e., males and gynes, were absent. From one col- 
ony, individual specimens of worker life stages (larva, 
pupa and worker, known hereafter as worker 2) were 
collected. Adult stages (male, gyne and worker, known 
hereafter as worker 1) were similarly collected from the 
second colony. Workers were collected from both colo- 
nies to provide a comparison of gene expression across 
the colonies for the worker phenotype. In total, six indi- 
viduals (one worker larva, one worker pupa, one male, 
one gyne and two worker adults) were collected for 
transcriptomic analyses. The larva and pupa were main- 
tained in their natal colony and monitored daily using 
photography. At seven days of development, the larva 
(third instar) and pupa were removed, snap frozen in 
liquid nitrogen and stored at -80°C. Upon emergence, 
adults were removed to a nursery, which housed three 
distinguishable older workers, each with clipped wings, a 
brood clump, pollen and sugar water (providing both a 
normal social environment to stimulate normal gene 
expression in the newly emerged adults and a setting 
that was simple to monitor). The presence of older 
workers within the nursery would suppress ovarian 
development and subsequent egg laying within subordi- 
nate younger adults. Adults matured in the nursery for 
three days. Adults were then sacrificed by snap freezing 
in liquid nitrogen and stored at -80°C. 

RNA Extraction, cDNA synthesis and EST sequencing 

Total RNA was isolated from the whole bodies of speci- 
mens using TRIzol Reagent (Invitrogen, UK) according 
to the manufacturer's instructions. Each specimen was 
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ground using a high performance disperser (T-18 Basic 
ULTRA TURRAX, IKA®) in 5 ml of TRIzol reagent 1.5 
ml of chloroform (Sigma, Ireland) was added to the 
TRIzol extract, mixed by inversion and incubated at 
room temperature for 3 min. The sample was centri- 
fuged at 11,500 g at 4°C for 15 min. The aqueous phase 
containing RNA was transferred to a fresh tube and an 
equal volume of 2-propanol (Sigma, Ireland) was added. 
The sample was vortexed, incubated at room tempera- 
ture for ten min and then centrifuged at 11,500 g at 4°C 
for 10 min, resulting in the precipitation of the RNA 
out of solution in the form of a pellet. The supernatant 
was then removed, 70% ETOH was added, and the sam- 
ple was vortexed and centrifuged at 9,250 g at 4°C for 5 
min. The new supernatant was removed and the pellet 
was allowed to dry at room temperature. The final pellet 
was resuspended in 100 [i\ Elution Solution supplied 
with the GenElute™ Mammalian Total RNA kit (Sigma, 
Ireland), and RNA purified using the kit following the 
manufacturer's instructions. DNase treatment was per- 
formed using an On-column DNase (Sigma, Ireland) at 
a concentration of 70 [i\ of DNase Digest Buffer for On- 
column DNase Digestion to 10 [il of DNase I. Total 
RNA was quantified and integrity assessed on an Agilent 
2100 Bioanalyzer using the Agilent RNA 6000 Nano 
Chip kit. cDNA was synthesised for 3 \ig of RNA using 
the Evrogen SMART cDNA synthesis service (Evrogen, 
Russia). 5 [ig of cDNA from each sample was used to 
create a Roche 454 sequencing library and sequenced on 
a Roche 454 Life Sciences GS FLX Titanium Series 
sequencer. The six B. terrestris samples were sequenced 
independently (one sample per lane), and base calling 
and quality screening performed using standard Roche 
pipeline version 2.3. Roche 454 sff files were deposited 
in the SRA on the EBI, library ERP000936. 

Transcriptome assembly 

Roche 454 reads from the individual life cycle stages 
were combined to the purpose of assembly. We 
assembled the reads into a first draft transcriptome for 
B. terrestris following the approach of Kumar and Blax- 
ter [41]. Two distinct assemblers were used to generate 
primary assemblies: gsAssembler from Roche/454 Life 
Sciences (also known as Newbler, version 2.5.3; with set- 
tings "-urt", "-cDNA", "-vt SMARTAdaptors.fna") (the 
Newbler_454 assembly), and MIRA (version 3.0.2; [84,85] 
(the MIRA_454 assembly). For MIRA assembly (settings: 
"-job=denovo,est,normal, 454"), SMART adaptors were 
removed using BLAST and a custom perl script. 

The B, terrestris Sanger-sequenced expressed sequence 
tag (EST) data (generated by Sadd et al. [31]) and other 
pre-existing B. terrestris mRNAs in EMBL/GenBank 
were downloaded from EMBL nucleotide database (18th 



August 2010), and assembled using PartiGene [86] (the 
PG_Sanger assembly). 

We then coassembled the three contig sets (PG_San- 
ger, Newbler_454 and MIRA_454) using CAP3 [87] (with 
settings: sequence similarity 98% and base overlap 50 bp), 
to generate BT_transcriptome_vl contigs. Examination 
of the BT_transcriptome_vl contigs both affirmed that 
the assembly generated contig sequences of higher cred- 
ibility than from the single assemblers, but also identified 
issues of remaining redundancy in the data. We therefore 
reclustered the BT_transcriptome_vl contig set using 
PartiGene to generate the final B, terrestris transcriptome 
assembly, BT_transcriptome_v2, for annotation and ana- 
lysis. We mapped each of the input reads in the Roche 
454 data back to this final BT_transcriptome_v2 assembly 
using BLAST and assessed stage-specific expression of 
each contig by counting reads per contig for each source 
library using a custom Perl script. The vast majority of 
the total reads (1,441,743 ESTs or 92.4%) were mapped 
to 33,875 BT_transcriptome_v2 contigs. 

Assessment of completeness and integrity of the 
assembly 

We assessed the quality of BT_transcriptome_v2 by: (a) 
assessing congruence with available B, terrestris tran- 
scriptome shotgun assembly data from [32] (INDC 
Accession numbers: JI045408-JI025924.1) using BLAST 
[88]; and (b) assessing completeness compared to the 
whole-genome derived transcriptomes for other Hyme- 
noptera (the official gene set (OGS) for the honeybee A, 
mellifera (from [89]), and the ants C. floridanus [NCBI 
Genome Project ID: 50201] and H, saltator [NCBI Gen- 
ome Project ID: 50203]) and other arthropods (the OGS 
for D, melanogaster from the latest genome release 
[90]), using tBLASTx with an E-value cutoff of le-10. 

Functional Annotation 

We annotated the BT_transcriptome_v2 contigs with 
best BLAST matches by comparing them to the NCBI nr 
database (27th March 2011; reporting up to five matches 
with an E-value cutoff of le-06). BT_transcriptome_v2 
contigs were also annotated with Gene Ontology (GO), 
Enzyme Commission (EC) and Kyoto Encyclopaedia of 
Genes and Genomes pathways (KEGG) identifiers using 
annot8r (v.l.l.l;[91,92]) with a cut-off bit score of 55. 
GO terms were further summarised using the GO-SUm 
hierarchy. InterProScans were performed to infer puta- 
tive function for hypothetical and unannotated contigs 
using Blast2GO software tool (V.2.4.8; [93-95]). 

The annotated data were searched for genes, pathways 
and processes of interest. Gene lists were generated 
based on annotated processes and pathways for the 
eusocial honeybee, A. mellifera, the parasitoid jewel 



Colgan et al. BMC Genomics 201 1, 12:623 
http://www.biomedcentral.eom/1 471 -21 64/1 2/623 



Page 18 of 20 



wasp, N, vitripennis, the holometabolous fruit fly, D. 
melanogaster, and the hemimetabolous pea aphid. A, 
pisum. Protein and nucleotide sequences of genes of 
interest were obtained from NCBI and compared to the 
B. terrestris transcriptome data using BLAST. Gene 
names and accession numbers used for these compari- 
sons are given in Additional file 4. 

Comparison of expression between life stages of B. 
terrestris 

R-STAT analysis devised by Stekel et al. [42], was used 
to identify contigs in BT_transcriptome_v2 that varied 
in their proportional contribution from each of the six 
sequenced samples. R-STAT calculates a log likelihood 
ratio statistic that estimates variation of proportional 
contribution to each contig from each sample. The 
resulting statistic, the R-value, expresses the variation in 
read contribution to each contig across life cycle stages. 
We treat the results from this study as preliminary 
because, given the absence of replication within life 
cycle stages and castes, these categories are confounded 
with the age of the individual specimens. Statistical ana- 
lyses were carried out using the R language (version 
2.11.1;[96]). 
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